Dataset statistics
| Number of variables | 21 |
|---|---|
| Number of observations | 1017209 |
| Missing cells | 2173431 |
| Missing cells (%) | 10.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 163.0 MiB |
| Average record size in memory | 168.0 B |
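The derived figures in this table follow from the raw counts. The sketch below recomputes them from the numbers reported above, assuming 1 MiB = 1024 × 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics table.
n_variables = 21
n_observations = 1_017_209
missing_cells = 2_173_431
total_size_mib = 163.0

# Share of missing cells = missing cells / (rows * columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size, assuming 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both results agree with the table after rounding, which suggests the percentage is taken over all cells rather than over rows.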
Variable types
| Numeric | 11 |
|---|---|
| Categorical | 10 |
Alerts
Date has a high cardinality: 942 distinct values | High cardinality |
DayOfWeek is highly correlated with Open | High correlation |
Sales is highly correlated with Customers and 1 other fields | High correlation |
Customers is highly correlated with Sales and 1 other fields | High correlation |
Open is highly correlated with DayOfWeek and 2 other fields | High correlation |
Open is highly correlated with Sales and 1 other fields | High correlation |
Assortment is highly correlated with StoreType | High correlation |
PromoInterval is highly correlated with Promo2 | High correlation |
StoreType is highly correlated with Assortment | High correlation |
Promo2 is highly correlated with PromoInterval | High correlation |
Sales is highly correlated with Customers and 2 other fields | High correlation |
Promo is highly correlated with Sales | High correlation |
StateHoliday is highly correlated with Open | High correlation |
Assortment is highly correlated with Customers and 1 other fields | High correlation |
CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek | High correlation |
Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields | High correlation |
Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other fields | High correlation |
PromoInterval is highly correlated with Promo2SinceWeek and 1 other fields | High correlation |
CompetitionOpenSinceMonth has 323348 (31.8%) missing values | Missing |
CompetitionOpenSinceYear has 323348 (31.8%) missing values | Missing |
Promo2SinceWeek has 508031 (49.9%) missing values | Missing |
Promo2SinceYear has 508031 (49.9%) missing values | Missing |
PromoInterval has 508031 (49.9%) missing values | Missing |
DayOfWeek has 144730 (14.2%) zeros | Zeros |
Sales has 172871 (17.0%) zeros | Zeros |
Customers has 172869 (17.0%) zeros | Zeros |
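The per-variable missing counts can be reconciled with the dataset-level total of 2173431 missing cells. The sketch below sums the five counts from the Missing alerts plus the 2642 missing values that the report lists in the CompetitionDistance section (not flagged as an alert), assuming no other variable has missing values.

```python
# Missing-value counts per variable, copied from the report.
missing_by_variable = {
    "CompetitionOpenSinceMonth": 323_348,
    "CompetitionOpenSinceYear": 323_348,
    "Promo2SinceWeek": 508_031,
    "Promo2SinceYear": 508_031,
    "PromoInterval": 508_031,
    "CompetitionDistance": 2_642,  # taken from that variable's own section
}

reported_total = 2_173_431  # "Missing cells" in the dataset statistics
computed_total = sum(missing_by_variable.values())

print(computed_total, computed_total == reported_total)
```

The totals match, so these six variables account for every missing cell in the dataset.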
Reproduction
| Analysis started | 2022-05-24 07:45:29.954613 |
|---|---|
| Analysis finished | 2022-05-24 07:46:52.223032 |
| Duration | 1 minute and 22.27 seconds |
| Software version | pandas-profiling v3.2.0 |
Store
Real number (ℝ≥0)
| Distinct | 1115 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 558.4297268 |
| Minimum | 1 |
| Maximum | 1115 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 56 |
| Q1 | 280 |
| median | 558 |
| Q3 | 838 |
| 95-th percentile | 1060 |
| Maximum | 1115 |
| Range | 1114 |
| Interquartile range (IQR) | 558 |
Descriptive statistics
| Standard deviation | 321.9086511 |
|---|---|
| Coefficient of variation (CV) | 0.5764532862 |
| Kurtosis | -1.200523741 |
| Mean | 558.4297268 |
| Median Absolute Deviation (MAD) | 279 |
| Skewness | -0.000954879981 |
| Sum | 568039744 |
| Variance | 103625.1797 |
| Monotonicity | Increasing |
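Several of the descriptive statistics are simple functions of the others. As a rough consistency check, the sketch below derives the mean, coefficient of variation, variance, IQR and range for Store from the reported sum, standard deviation, quartiles and extremes. It assumes the count equals the number of observations, since Store has no missing values.

```python
import math

# Values copied from the Store tables.
n = 1_017_209            # observations (no missing values)
total = 568_039_744      # Sum
std = 321.9086511        # Standard deviation
q1, q3 = 280, 838
minimum, maximum = 1, 1115

mean = total / n               # Mean = Sum / count
cv = std / mean                # Coefficient of variation = std / mean
variance = std ** 2            # Variance = std squared
iqr = q3 - q1                  # Interquartile range
value_range = maximum - minimum

print(mean, cv, variance, iqr, value_range)
print(math.isclose(cv, 0.5764532862, rel_tol=1e-6))
```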
| Value | Count | Frequency (%) |
| 1 | 942 | 0.1% |
| 726 | 942 | 0.1% |
| 708 | 942 | 0.1% |
| 709 | 942 | 0.1% |
| 713 | 942 | 0.1% |
| 714 | 942 | 0.1% |
| 715 | 942 | 0.1% |
| 717 | 942 | 0.1% |
| 718 | 942 | 0.1% |
| 720 | 942 | 0.1% |
| Other values (1105) | 1007789 |
| Value | Count | Frequency (%) |
| 1 | 942 | |
| 2 | 942 | |
| 3 | 942 | |
| 4 | 942 | |
| 5 | 942 | |
| 6 | 942 | |
| 7 | 942 | |
| 8 | 942 | |
| 9 | 942 | |
| 10 | 942 |
| Value | Count | Frequency (%) |
| 1115 | 942 | |
| 1114 | 942 | |
| 1113 | 942 | |
| 1112 | 942 | |
| 1111 | 942 | |
| 1110 | 942 | |
| 1109 | 758 | |
| 1108 | 942 | |
| 1107 | 758 | |
| 1106 | 942 |
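Most stores appear 942 times, which matches the 942 distinct Date values noted in the alerts, but stores 1109 and 1107 have only 758 rows. The sketch below compares the observed row count with a full store-by-date panel to estimate how many store-days are absent. It assumes at most one row per store per date, which is consistent with the report showing 0 duplicate rows but is not stated directly.

```python
n_stores = 1115          # distinct Store values
n_dates = 942            # distinct Date values
n_observations = 1_017_209

# Size of a complete panel with one row per (store, date) pair.
full_panel = n_stores * n_dates
absent_rows = full_panel - n_observations
absent_pct = 100 * absent_rows / full_panel

print(full_panel, absent_rows, f"{absent_pct:.2f}%")
```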
DayOfWeek
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.998340557 |
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 144730 |
| Zeros (%) | 14.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 1.997390965 |
|---|---|
| Coefficient of variation (CV) | 0.6661654761 |
| Kurtosis | -1.246873339 |
| Mean | 2.998340557 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.001592822804 |
| Sum | 3049939 |
| Variance | 3.989570667 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 4 | 145845 | |
| 3 | 145845 | |
| 2 | 145665 | |
| 1 | 145664 | |
| 0 | 144730 | |
| 6 | 144730 | |
| 5 | 144730 |
| Value | Count | Frequency (%) |
| 0 | 144730 | |
| 1 | 145664 | |
| 2 | 145665 | |
| 3 | 145845 | |
| 4 | 145845 | |
| 5 | 144730 | |
| 6 | 144730 |
| Value | Count | Frequency (%) |
| 6 | 144730 | |
| 5 | 144730 | |
| 4 | 145845 | |
| 3 | 145845 | |
| 2 | 145665 | |
| 1 | 145664 | |
| 0 | 144730 |
Date
Categorical
| Distinct | 942 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 10172090 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2015-07-31 |
|---|---|
| 2nd row | 2015-07-30 |
| 3rd row | 2015-07-29 |
| 4th row | 2015-07-28 |
| 5th row | 2015-07-27 |
Common Values
| Value | Count | Frequency (%) |
| 2015-07-31 | 1115 | 0.1% |
| 2013-11-06 | 1115 | 0.1% |
| 2013-11-18 | 1115 | 0.1% |
| 2013-11-17 | 1115 | 0.1% |
| 2013-11-16 | 1115 | 0.1% |
| 2013-11-15 | 1115 | 0.1% |
| 2013-11-14 | 1115 | 0.1% |
| 2013-11-13 | 1115 | 0.1% |
| 2013-11-12 | 1115 | 0.1% |
| 2013-11-11 | 1115 | 0.1% |
| Other values (932) | 1006059 |
Length
| Value | Count | Frequency (%) |
| 2015-07-31 | 1115 | 0.1% |
| 2015-05-13 | 1115 | 0.1% |
| 2015-03-22 | 1115 | 0.1% |
| 2015-03-23 | 1115 | 0.1% |
| 2015-03-24 | 1115 | 0.1% |
| 2015-03-25 | 1115 | 0.1% |
| 2015-03-26 | 1115 | 0.1% |
| 2015-03-27 | 1115 | 0.1% |
| 2015-04-15 | 1115 | 0.1% |
| 2015-04-17 | 1115 | 0.1% |
| Other values (932) | 1006059 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 2307842 | |
| - | 2034418 | |
| 1 | 1825657 | |
| 2 | 1606379 | |
| 3 | 660614 | 6.5% |
| 4 | 574660 | 5.6% |
| 5 | 440530 | 4.3% |
| 6 | 200805 | 2.0% |
| 7 | 198570 | 2.0% |
| 8 | 164005 | 1.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 8137672 | |
| Dash Punctuation | 2034418 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 2307842 | |
| 1 | 1825657 | |
| 2 | 1606379 | |
| 3 | 660614 | 8.1% |
| 4 | 574660 | 7.1% |
| 5 | 440530 | 5.4% |
| 6 | 200805 | 2.5% |
| 7 | 198570 | 2.4% |
| 8 | 164005 | 2.0% |
| 9 | 158610 | 1.9% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2034418 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 10172090 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 2307842 | |
| - | 2034418 | |
| 1 | 1825657 | |
| 2 | 1606379 | |
| 3 | 660614 | 6.5% |
| 4 | 574660 | 5.6% |
| 5 | 440530 | 4.3% |
| 6 | 200805 | 2.0% |
| 7 | 198570 | 2.0% |
| 8 | 164005 | 1.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 10172090 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 2307842 | |
| - | 2034418 | |
| 1 | 1825657 | |
| 2 | 1606379 | |
| 3 | 660614 | 6.5% |
| 4 | 574660 | 5.6% |
| 5 | 440530 | 4.3% |
| 6 | 200805 | 2.0% |
| 7 | 198570 | 2.0% |
| 8 | 164005 | 1.6% |
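The category, script and block tables classify every character of the date strings. The sketch below reproduces the category split for the five sample rows using Python's standard `unicodedata` module. This is an illustration of the idea and not necessarily how pandas-profiling computes it; `Nd` and `Pd` are the Unicode general-category codes for Decimal Number and Dash Punctuation.

```python
import unicodedata
from collections import Counter

# The five sample rows shown in the Sample table.
sample = ["2015-07-31", "2015-07-30", "2015-07-29", "2015-07-28", "2015-07-27"]

# Count characters by Unicode general category.
category_counts = Counter(
    unicodedata.category(ch) for value in sample for ch in value
)
dash_share = 100 * category_counts["Pd"] / sum(category_counts.values())

print(category_counts)
print(f"dash share: {dash_share:.1f}%")
```

Each ten-character date holds two dashes, so the dash share is 20.0%, matching the Dash Punctuation row (2034418 of 10172090 characters).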
Sales
Real number (ℝ≥0)
| Distinct | 21734 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5773.818972 |
| Minimum | 0 |
| Maximum | 41551 |
| Zeros | 172871 |
| Zeros (%) | 17.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 3727 |
| median | 5744 |
| Q3 | 7856 |
| 95-th percentile | 12137 |
| Maximum | 41551 |
| Range | 41551 |
| Interquartile range (IQR) | 4129 |
Descriptive statistics
| Standard deviation | 3849.926175 |
|---|---|
| Coefficient of variation (CV) | 0.6667902464 |
| Kurtosis | 1.778374747 |
| Mean | 5773.818972 |
| Median Absolute Deviation (MAD) | 2067 |
| Skewness | 0.6414596158 |
| Sum | 5873180623 |
| Variance | 14821931.55 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 172871 | 17.0% |
| 5674 | 215 | < 0.1% |
| 5558 | 197 | < 0.1% |
| 5483 | 196 | < 0.1% |
| 6214 | 195 | < 0.1% |
| 6049 | 195 | < 0.1% |
| 5723 | 194 | < 0.1% |
| 5449 | 192 | < 0.1% |
| 5140 | 191 | < 0.1% |
| 5489 | 191 | < 0.1% |
| Other values (21724) | 842572 |
| Value | Count | Frequency (%) |
| 0 | 172871 | |
| 46 | 1 | < 0.1% |
| 124 | 1 | < 0.1% |
| 133 | 1 | < 0.1% |
| 286 | 1 | < 0.1% |
| 297 | 1 | < 0.1% |
| 316 | 1 | < 0.1% |
| 416 | 1 | < 0.1% |
| 506 | 1 | < 0.1% |
| 520 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 41551 | 1 | |
| 38722 | 1 | |
| 38484 | 1 | |
| 38367 | 1 | |
| 38037 | 1 | |
| 38025 | 1 | |
| 37646 | 1 | |
| 37403 | 1 | |
| 37376 | 1 | |
| 37122 | 1 |
Customers
Real number (ℝ≥0)
| Distinct | 4086 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 633.1459464 |
| Minimum | 0 |
| Maximum | 7388 |
| Zeros | 172869 |
| Zeros (%) | 17.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 405 |
| median | 609 |
| Q3 | 837 |
| 95-th percentile | 1362 |
| Maximum | 7388 |
| Range | 7388 |
| Interquartile range (IQR) | 432 |
Descriptive statistics
| Standard deviation | 464.4117339 |
|---|---|
| Coefficient of variation (CV) | 0.7334987083 |
| Kurtosis | 7.091772718 |
| Mean | 633.1459464 |
| Median Absolute Deviation (MAD) | 216 |
| Skewness | 1.59865029 |
| Sum | 644041755 |
| Variance | 215678.2586 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 172869 | 17.0% |
| 560 | 2414 | 0.2% |
| 576 | 2363 | 0.2% |
| 603 | 2337 | 0.2% |
| 571 | 2330 | 0.2% |
| 555 | 2328 | 0.2% |
| 566 | 2327 | 0.2% |
| 517 | 2326 | 0.2% |
| 539 | 2309 | 0.2% |
| 651 | 2299 | 0.2% |
| Other values (4076) | 823307 |
| Value | Count | Frequency (%) |
| 0 | 172869 | |
| 3 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| 18 | 1 | < 0.1% |
| 36 | 1 | < 0.1% |
| 40 | 1 | < 0.1% |
| 44 | 1 | < 0.1% |
| 50 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 7388 | 1 | |
| 5494 | 1 | |
| 5458 | 1 | |
| 5387 | 1 | |
| 5297 | 1 | |
| 5192 | 1 | |
| 5152 | 1 | |
| 5145 | 1 | |
| 5132 | 1 | |
| 5112 | 1 |
Open
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1017209 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 844392 | |
| 0 | 172817 | 17.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 844392 | |
| 0 | 172817 | 17.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1017209 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 844392 | |
| 0 | 172817 | 17.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1017209 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 844392 | |
| 0 | 172817 | 17.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1017209 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 844392 | |
| 0 | 172817 | 17.0% |
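The 172817 rows with value 0 in this variable are close to, but not equal to, the zero counts reported in the alerts for Sales (172871) and Customers (172869). The sketch below quantifies the gap under two assumptions that the report does not confirm: that this variable is the Open flag (0 = closed), and that every closed row has zero sales and zero customers.

```python
# Counts copied from the report.
closed_rows = 172_817      # rows with value 0 in this variable
zero_sales = 172_871       # "Sales has 172871 (17.0%) zeros"
zero_customers = 172_869   # "Customers has 172869 (17.0%) zeros"

# Under the stated assumptions, the remaining zeros occur on open days.
open_rows_zero_sales = zero_sales - closed_rows
open_rows_zero_customers = zero_customers - closed_rows

print(open_rows_zero_sales, open_rows_zero_customers)
```

If the assumptions hold, a few dozen open-store rows record no sales, which may be worth inspecting before modelling.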
Promo
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1017209 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 629129 | |
| 1 | 388080 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 629129 | |
| 1 | 388080 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1017209 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 629129 | |
| 1 | 388080 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1017209 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 629129 | |
| 1 | 388080 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1017209 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 629129 | |
| 1 | 388080 |
StateHoliday
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1017209 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 986159 | |
| a | 20260 | 2.0% |
| b | 6690 | 0.7% |
| c | 4100 | 0.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 986159 | |
| a | 20260 | 2.0% |
| b | 6690 | 0.7% |
| c | 4100 | 0.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 986159 | |
| Lowercase Letter | 31050 | 3.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 20260 | |
| b | 6690 | 21.5% |
| c | 4100 | 13.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 986159 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 986159 | |
| Latin | 31050 | 3.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 20260 | |
| b | 6690 | 21.5% |
| c | 4100 | 13.2% |
Common
| Value | Count | Frequency (%) |
| 0 | 986159 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1017209 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 986159 | |
| a | 20260 | 2.0% |
| b | 6690 | 0.7% |
| c | 4100 | 0.4% |
SchoolHoliday
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1017209 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 835488 | |
| 1 | 181721 | 17.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 835488 | |
| 1 | 181721 | 17.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1017209 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 835488 | |
| 1 | 181721 | 17.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1017209 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 835488 | |
| 1 | 181721 | 17.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1017209 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 835488 | |
| 1 | 181721 | 17.9% |
StoreType
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1017209 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | c |
|---|---|
| 2nd row | c |
| 3rd row | c |
| 4th row | c |
| 5th row | c |
Common Values
| Value | Count | Frequency (%) |
| a | 551627 | |
| d | 312912 | |
| c | 136840 | 13.5% |
| b | 15830 | 1.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 551627 | |
| d | 312912 | |
| c | 136840 | 13.5% |
| b | 15830 | 1.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1017209 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 551627 | |
| d | 312912 | |
| c | 136840 | 13.5% |
| b | 15830 | 1.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1017209 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 551627 | |
| d | 312912 | |
| c | 136840 | 13.5% |
| b | 15830 | 1.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1017209 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 551627 | |
| d | 312912 | |
| c | 136840 | 13.5% |
| b | 15830 | 1.6% |
Assortment
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1017209 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | a |
|---|---|
| 2nd row | a |
| 3rd row | a |
| 4th row | a |
| 5th row | a |
Common Values
| Value | Count | Frequency (%) |
| a | 537445 | |
| c | 471470 | |
| b | 8294 | 0.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 537445 | |
| c | 471470 | |
| b | 8294 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1017209 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 537445 | |
| c | 471470 | |
| b | 8294 | 0.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1017209 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 537445 | |
| c | 471470 | |
| b | 8294 | 0.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1017209 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 537445 | |
| c | 471470 | |
| b | 8294 | 0.8% |
CompetitionDistance
Real number (ℝ≥0)
| Distinct | 654 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 2642 |
| Missing (%) | 0.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5430.085652 |
| Minimum | 20 |
| Maximum | 75860 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 130 |
| Q1 | 710 |
| median | 2330 |
| Q3 | 6890 |
| 95-th percentile | 20390 |
| Maximum | 75860 |
| Range | 75840 |
| Interquartile range (IQR) | 6180 |
Descriptive statistics
| Standard deviation | 7715.3237 |
|---|---|
| Coefficient of variation (CV) | 1.420847514 |
| Kurtosis | 13.00002236 |
| Mean | 5430.085652 |
| Median Absolute Deviation (MAD) | 1980 |
| Skewness | 2.928534017 |
| Sum | 5509185710 |
| Variance | 59526219.8 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 250 | 11120 | 1.1% |
| 50 | 7536 | 0.7% |
| 350 | 7536 | 0.7% |
| 1200 | 7374 | 0.7% |
| 190 | 7352 | 0.7% |
| 180 | 6594 | 0.6% |
| 90 | 6594 | 0.6% |
| 330 | 6410 | 0.6% |
| 150 | 6226 | 0.6% |
| 2640 | 5652 | 0.6% |
| Other values (644) | 942173 |
| Value | Count | Frequency (%) |
| 20 | 942 | 0.1% |
| 30 | 3767 | |
| 40 | 4710 | |
| 50 | 7536 | |
| 60 | 2826 | 0.3% |
| 70 | 4526 | |
| 80 | 2826 | 0.3% |
| 90 | 6594 | |
| 100 | 4710 | |
| 110 | 5468 |
| Value | Count | Frequency (%) |
| 75860 | 942 | |
| 58260 | 942 | |
| 48330 | 942 | |
| 46590 | 942 | |
| 45740 | 942 | |
| 44320 | 942 | |
| 40860 | 942 | |
| 40540 | 942 | |
| 38710 | 942 | |
| 38630 | 942 |
CompetitionOpenSinceMonth
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 323348 |
| Missing (%) | 31.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.222865963 |
| Minimum | 1 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 4 |
| median | 8 |
| Q3 | 10 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.211832113 |
|---|---|
| Coefficient of variation (CV) | 0.4446755803 |
| Kurtosis | -1.248357036 |
| Mean | 7.222865963 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.1698616346 |
| Sum | 5011665 |
| Variance | 10.31586553 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 9 | 114254 | 11.2% |
| 4 | 87076 | 8.6% |
| 11 | 84455 | 8.3% |
| 3 | 63548 | 6.2% |
| 7 | 59434 | 5.8% |
| 12 | 57896 | 5.7% |
| 10 | 55622 | 5.5% |
| 6 | 45444 | 4.5% |
| 5 | 39608 | 3.9% |
| 2 | 37886 | 3.7% |
| Other values (2) | 48638 | 4.8% |
| (Missing) | 323348 |
| Value | Count | Frequency (%) |
| 1 | 12452 | 1.2% |
| 2 | 37886 | 3.7% |
| 3 | 63548 | |
| 4 | 87076 | |
| 5 | 39608 | 3.9% |
| 6 | 45444 | 4.5% |
| 7 | 59434 | |
| 8 | 36186 | 3.6% |
| 9 | 114254 | |
| 10 | 55622 |
| Value | Count | Frequency (%) |
| 12 | 57896 | |
| 11 | 84455 | |
| 10 | 55622 | |
| 9 | 114254 | |
| 8 | 36186 | 3.6% |
| 7 | 59434 | |
| 6 | 45444 | 4.5% |
| 5 | 39608 | 3.9% |
| 4 | 87076 | |
| 3 | 63548 |
CompetitionOpenSinceYear
Real number (ℝ≥0)
| Distinct | 23 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 323348 |
| Missing (%) | 31.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2008.690228 |
| Minimum | 1900 |
| Maximum | 2015 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1900 |
|---|---|
| 5-th percentile | 2001 |
| Q1 | 2006 |
| median | 2010 |
| Q3 | 2013 |
| 95-th percentile | 2015 |
| Maximum | 2015 |
| Range | 115 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 5.992644444 |
|---|---|
| Coefficient of variation (CV) | 0.002983359187 |
| Kurtosis | 121.934675 |
| Mean | 2008.690228 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -7.539514879 |
| Sum | 1393751810 |
| Variance | 35.91178743 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2013 | 75426 | 7.4% |
| 2012 | 74299 | 7.3% |
| 2014 | 63732 | 6.3% |
| 2005 | 56564 | 5.6% |
| 2010 | 51258 | 5.0% |
| 2011 | 49396 | 4.9% |
| 2009 | 49396 | 4.9% |
| 2008 | 48476 | 4.8% |
| 2007 | 43744 | 4.3% |
| 2006 | 42802 | 4.2% |
| Other values (13) | 138768 | |
| (Missing) | 323348 |
| Value | Count | Frequency (%) |
| 1900 | 758 | 0.1% |
| 1961 | 942 | 0.1% |
| 1990 | 4710 | 0.5% |
| 1994 | 1884 | 0.2% |
| 1995 | 1700 | 0.2% |
| 1998 | 942 | 0.1% |
| 1999 | 7352 | 0.7% |
| 2000 | 9236 | 0.9% |
| 2001 | 14704 | |
| 2002 | 24882 |
| Value | Count | Frequency (%) |
| 2015 | 35060 | |
| 2014 | 63732 | |
| 2013 | 75426 | |
| 2012 | 74299 | |
| 2011 | 49396 | |
| 2010 | 51258 | |
| 2009 | 49396 | |
| 2008 | 48476 | |
| 2007 | 43744 | |
| 2006 | 42802 |
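The minimum of 1900, the skewness of about -7.5 and the kurtosis of about 122 point to a small number of unusually early years. One common screening rule is Tukey's 1.5 × IQR fences. The report itself does not apply this rule, so the sketch below is only an illustration using the quartiles and the ten smallest values listed above.

```python
# Quartiles copied from the quantile statistics table.
q1, q3 = 2006, 2013
iqr = q3 - q1

# Tukey fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The ten smallest distinct years from the minimum-values table.
smallest_years = [1900, 1961, 1990, 1994, 1995, 1998, 1999, 2000, 2001, 2002]
flagged = [y for y in smallest_years if y < lower_fence or y > upper_fence]

print(lower_fence, upper_fence, flagged)
```

Whether flagged years such as 1900 are data-entry placeholders or genuine opening dates cannot be determined from the report alone.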
Promo2
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1017209 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 509178 | |
| 0 | 508031 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 509178 | |
| 0 | 508031 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1017209 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 509178 | |
| 0 | 508031 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1017209 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 509178 | |
| 0 | 508031 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1017209 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 509178 | |
| 0 | 508031 |
Promo2SinceWeek
Real number (ℝ≥0)
| Distinct | 24 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 508031 |
| Missing (%) | 49.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 23.26909254 |
| Minimum | 1 |
| Maximum | 50 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 13 |
| median | 22 |
| Q3 | 37 |
| 95-th percentile | 45 |
| Maximum | 50 |
| Range | 49 |
| Interquartile range (IQR) | 24 |
Descriptive statistics
| Standard deviation | 14.09597253 |
|---|---|
| Coefficient of variation (CV) | 0.6057809305 |
| Kurtosis | -1.369928605 |
| Mean | 23.26909254 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 0.1045275226 |
| Sum | 11848110 |
| Variance | 198.6964415 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 14 | 72990 | 7.2% |
| 40 | 62598 | 6.2% |
| 31 | 39976 | 3.9% |
| 10 | 38828 | 3.8% |
| 5 | 35818 | 3.5% |
| 37 | 32786 | 3.2% |
| 1 | 32418 | 3.2% |
| 13 | 29820 | 2.9% |
| 45 | 29268 | 2.9% |
| 22 | 28694 | 2.8% |
| Other values (14) | 105982 | 10.4% |
| (Missing) | 508031 |
| Value | Count | Frequency (%) |
| 1 | 32418 | |
| 5 | 35818 | |
| 6 | 942 | 0.1% |
| 9 | 12452 | 1.2% |
| 10 | 38828 | |
| 13 | 29820 | |
| 14 | 72990 | |
| 18 | 27318 | 2.7% |
| 22 | 28694 | 2.8% |
| 23 | 4342 | 0.4% |
| Value | Count | Frequency (%) |
| 50 | 942 | 0.1% |
| 49 | 758 | 0.1% |
| 48 | 8294 | 0.8% |
| 45 | 29268 | |
| 44 | 2642 | 0.3% |
| 40 | 62598 | |
| 39 | 4732 | 0.5% |
| 37 | 32786 | |
| 36 | 9236 | 0.9% |
| 35 | 22814 | 2.2% |
Promo2SinceYear
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 508031 |
| Missing (%) | 49.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2011.752774 |
| Minimum | 2009 |
| Maximum | 2015 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 2009 |
|---|---|
| 5-th percentile | 2009 |
| Q1 | 2011 |
| median | 2012 |
| Q3 | 2013 |
| 95-th percentile | 2014 |
| Maximum | 2015 |
| Range | 6 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.662870431 |
|---|---|
| Coefficient of variation (CV) | 0.0008265779235 |
| Kurtosis | -1.04066228 |
| Mean | 2011.752774 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.1200599167 |
| Sum | 1024340254 |
| Variance | 2.765138069 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2011 | 115056 | 11.3% |
| 2013 | 110464 | 10.9% |
| 2014 | 79922 | 7.9% |
| 2012 | 73174 | 7.2% |
| 2009 | 65270 | 6.4% |
| 2010 | 56240 | 5.5% |
| 2015 | 9052 | 0.9% |
| (Missing) | 508031 |
| Value | Count | Frequency (%) |
| 2009 | 65270 | |
| 2010 | 56240 | |
| 2011 | 115056 | |
| 2012 | 73174 | |
| 2013 | 110464 | |
| 2014 | 79922 | |
| 2015 | 9052 | 0.9% |
| Value | Count | Frequency (%) |
| 2015 | 9052 | 0.9% |
| 2014 | 79922 | |
| 2013 | 110464 | |
| 2012 | 73174 | |
| 2011 | 115056 | |
| 2010 | 56240 | |
| 2009 | 65270 |
PromoInterval
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 508031 |
| Missing (%) | 49.9% |
| Memory size | 7.8 MiB |
Length
| Max length | 16 |
|---|---|
| Median length | 15 |
| Mean length | 15.19140654 |
| Min length | 15 |
Characters and Unicode
| Total characters | 7735130 |
|---|---|
| Distinct characters | 23 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Jan,Apr,Jul,Oct |
|---|---|
| 2nd row | Jan,Apr,Jul,Oct |
| 3rd row | Jan,Apr,Jul,Oct |
| 4th row | Jan,Apr,Jul,Oct |
| 5th row | Jan,Apr,Jul,Oct |
Common Values
| Value | Count | Frequency (%) |
| Jan,Apr,Jul,Oct | 293122 | |
| Feb,May,Aug,Nov | 118596 | 11.7% |
| Mar,Jun,Sept,Dec | 97460 | 9.6% |
| (Missing) | 508031 |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| jan,apr,jul,oct | 293122 | |
| feb,may,aug,nov | 118596 | |
| mar,jun,sept,dec | 97460 | 19.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| , | 1527534 | 19.7% |
| J | 683704 | 8.8% |
| u | 509178 | 6.6% |
| a | 509178 | 6.6% |
| A | 411718 | 5.3% |
| c | 390582 | 5.0% |
| t | 390582 | 5.0% |
| r | 390582 | 5.0% |
| p | 390582 | 5.0% |
| n | 390582 | 5.0% |
| Other values (13) | 2140908 | 27.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 4170884 | 53.9% |
| Uppercase Letter | 2036712 | 26.3% |
| Other Punctuation | 1527534 | 19.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| u | 509178 | 12.2% |
| a | 509178 | 12.2% |
| c | 390582 | 9.4% |
| t | 390582 | 9.4% |
| r | 390582 | 9.4% |
| p | 390582 | 9.4% |
| n | 390582 | 9.4% |
| e | 313516 | 7.5% |
| l | 293122 | 7.0% |
| b | 118596 | 2.8% |
| Other values (4) | 474384 | 11.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| J | 683704 | 33.6% |
| A | 411718 | 20.2% |
| O | 293122 | 14.4% |
| M | 216056 | 10.6% |
| F | 118596 | 5.8% |
| N | 118596 | 5.8% |
| S | 97460 | 4.8% |
| D | 97460 | 4.8% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| , | 1527534 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 6207596 | 80.3% |
| Common | 1527534 | 19.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| J | 683704 | 11.0% |
| u | 509178 | 8.2% |
| a | 509178 | 8.2% |
| A | 411718 | 6.6% |
| c | 390582 | 6.3% |
| t | 390582 | 6.3% |
| r | 390582 | 6.3% |
| p | 390582 | 6.3% |
| n | 390582 | 6.3% |
| e | 313516 | 5.1% |
| Other values (12) | 1827392 | 29.4% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| , | 1527534 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 7735130 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| , | 1527534 | 19.7% |
| J | 683704 | 8.8% |
| u | 509178 | 6.6% |
| a | 509178 | 6.6% |
| A | 411718 | 5.3% |
| c | 390582 | 5.0% |
| t | 390582 | 5.0% |
| r | 390582 | 5.0% |
| p | 390582 | 5.0% |
| n | 390582 | 5.0% |
| Other values (13) | 2140908 | 27.7% |
Year
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
Bar chart of category counts: 2013; 2014; 2015
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 4068836 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2015 |
|---|---|
| 2nd row | 2015 |
| 3rd row | 2015 |
| 4th row | 2015 |
| 5th row | 2015 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2013 | 406974 | 40.0% |
| 2014 | 373855 | 36.8% |
| 2015 | 236380 | 23.2% |
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| 2013 | 406974 | 40.0% |
| 2014 | 373855 | 36.8% |
| 2015 | 236380 | 23.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1017209 | 25.0% |
| 0 | 1017209 | 25.0% |
| 1 | 1017209 | 25.0% |
| 3 | 406974 | 10.0% |
| 4 | 373855 | 9.2% |
| 5 | 236380 | 5.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 4068836 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1017209 | 25.0% |
| 0 | 1017209 | 25.0% |
| 1 | 1017209 | 25.0% |
| 3 | 406974 | 10.0% |
| 4 | 373855 | 9.2% |
| 5 | 236380 | 5.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 4068836 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1017209 | 25.0% |
| 0 | 1017209 | 25.0% |
| 1 | 1017209 | 25.0% |
| 3 | 406974 | 10.0% |
| 4 | 373855 | 9.2% |
| 5 | 236380 | 5.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 4068836 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1017209 | 25.0% |
| 0 | 1017209 | 25.0% |
| 1 | 1017209 | 25.0% |
| 3 | 406974 | 10.0% |
| 4 | 373855 | 9.2% |
| 5 | 236380 | 5.8% |
Month
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.846762072 |

| Minimum | 1 |
|---|---|
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 6 |
| Q3 | 8 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.326096562 |
|---|---|
| Coefficient of variation (CV) | 0.5688783845 |
| Kurtosis | -1.017876008 |
| Mean | 5.846762072 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.2742016429 |
| Sum | 5947379 |
| Variance | 11.06291834 |
| Monotonicity | Not monotonic |
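Several of the descriptive statistics for Month are tied together by simple arithmetic, which gives a quick consistency check on the table. The following is a minimal sketch, assuming only the figures printed in this report (the observation count comes from the dataset statistics); the variable names are illustrative.

```python
import math

# Figures copied from the report.
n_observations = 1017209   # "Number of observations"
total = 5947379            # "Sum" for Month
std_dev = 3.326096562      # "Standard deviation" for Month
q1, q3 = 3, 8              # quartiles for Month
minimum, maximum = 1, 12

mean = total / n_observations   # mean is the sum divided by the count
variance = std_dev ** 2         # variance is the squared standard deviation
cv = std_dev / mean             # coefficient of variation
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum

print(round(mean, 9), round(variance, 8), round(cv, 10), iqr, value_range)

# Compare against the values the report prints, allowing for rounding.
assert math.isclose(mean, 5.846762072, rel_tol=1e-7)
assert math.isclose(variance, 11.06291834, rel_tol=1e-7)
assert math.isclose(cv, 0.5688783845, rel_tol=1e-7)
```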
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 103695 | 10.2% |
| 3 | 103695 | 10.2% |
| 1 | 103694 | 10.2% |
| 6 | 100350 | 9.9% |
| 4 | 100350 | 9.9% |
| 7 | 98115 | 9.6% |
| 2 | 93660 | 9.2% |
| 12 | 63550 | 6.2% |
| 10 | 63550 | 6.2% |
| 8 | 63550 | 6.2% |
| Other values (2) | 123000 | 12.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 103694 | 10.2% |
| 2 | 93660 | 9.2% |
| 3 | 103695 | 10.2% |
| 4 | 100350 | 9.9% |
| 5 | 103695 | 10.2% |
| 6 | 100350 | 9.9% |
| 7 | 98115 | 9.6% |
| 8 | 63550 | 6.2% |
| 9 | 61500 | 6.0% |
| 10 | 63550 | 6.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 12 | 63550 | 6.2% |
| 11 | 61500 | 6.0% |
| 10 | 63550 | 6.2% |
| 9 | 61500 | 6.0% |
| 8 | 63550 | 6.2% |
| 7 | 98115 | 9.6% |
| 6 | 100350 | 9.9% |
| 5 | 103695 | 10.2% |
| 4 | 100350 | 9.9% |
| 3 | 103695 | 10.2% |
Day
Real number (ℝ≥0)
| Distinct | 31 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.70278969 |

| Minimum | 1 |
|---|---|
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 8 |
| median | 16 |
| Q3 | 23 |
| 95-th percentile | 29 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 8.787637613 |
|---|---|
| Coefficient of variation (CV) | 0.5596227031 |
| Kurtosis | -1.192005785 |
| Mean | 15.70278969 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.008454085266 |
| Sum | 15973019 |
| Variance | 77.22257483 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
|---|---|---|
| 16 | 33485 | 3.3% |
| 17 | 33485 | 3.3% |
| 6 | 33485 | 3.3% |
| 7 | 33485 | 3.3% |
| 8 | 33485 | 3.3% |
| 9 | 33485 | 3.3% |
| 10 | 33485 | 3.3% |
| 11 | 33485 | 3.3% |
| 12 | 33485 | 3.3% |
| 13 | 33485 | 3.3% |
| Other values (21) | 682359 | 67.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 33484 | 3.3% |
| 2 | 33485 | 3.3% |
| 3 | 33485 | 3.3% |
| 4 | 33485 | 3.3% |
| 5 | 33485 | 3.3% |
| 6 | 33485 | 3.3% |
| 7 | 33485 | 3.3% |
| 8 | 33485 | 3.3% |
| 9 | 33485 | 3.3% |
| 10 | 33485 | 3.3% |

| Value | Count | Frequency (%) |
|---|---|---|
| 31 | 19350 | 1.9% |
| 30 | 30140 | 3.0% |
| 29 | 30140 | 3.0% |
| 28 | 33485 | 3.3% |
| 27 | 33485 | 3.3% |
| 26 | 33485 | 3.3% |
| 25 | 33485 | 3.3% |
| 24 | 33485 | 3.3% |
| 23 | 33485 | 3.3% |
| 22 | 33485 | 3.3% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
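The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. This is an illustrative sketch, not the code the report itself runs; the function names and the toy data are assumptions made for the example, and ties are handled with averaged ranks.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def covariance_ratio(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors cancel between numerator and denominator, so raw sums are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman_rho(x, y):
    """Spearman's rho: the covariance ratio applied to the rank variables."""
    return covariance_ratio(average_ranks(x), average_ranks(y))

# Toy data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y), covariance_ratio(x, y))
```

On this toy data the rank-based coefficient is essentially 1, while the same ratio applied to the raw values stays below 1, which is the difference between monotonic and linear correlation.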
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the slope does not affect r (only its sign does). To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
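As a rough illustration of the formula and of the invariance property, the sketch below computes r directly from its definition and repeats the calculation after shifting and rescaling both variables. The function name `pearson_r` and the toy data are assumptions made for the example, not part of the report.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors cancel, so plain sums of products are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Toy data: roughly linear with a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)

# Change location and (positive) scale of each variable separately.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

# Flipping the sign of one variable flips the sign of r.
r_flipped = pearson_r(x, [-b for b in y])

print(r, r_rescaled, r_flipped)
```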
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
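The pair-counting definition above translates almost literally into code. The sketch below implements that plain form (often called tau-a), which counts tied pairs as neither concordant nor discordant; library implementations commonly apply a tie correction instead, so their results can differ when ties are present. The function name and the toy data are assumptions made for the example.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # A product of zero is a tie and counts toward neither total.
    return (concordant - discordant) / len(pairs)

# Toy data: six pairs in total, one of which (the 2nd and 3rd items) is discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau(x, y))  # (5 - 1) / 6
```

The double loop over pairs is quadratic in the number of observations, so it is only meant to show the definition, not to scale to a dataset of this size.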
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.
First rows
| | Store | DayOfWeek | Date | Sales | Customers | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval | Year | Month | Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 4 | 2015-07-31 | 5263 | 555 | 1 | 1 | 0 | 1 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 31 |
| 1 | 1 | 3 | 2015-07-30 | 5020 | 546 | 1 | 1 | 0 | 1 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 30 |
| 2 | 1 | 2 | 2015-07-29 | 4782 | 523 | 1 | 1 | 0 | 1 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 29 |
| 3 | 1 | 1 | 2015-07-28 | 5011 | 560 | 1 | 1 | 0 | 1 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 28 |
| 4 | 1 | 0 | 2015-07-27 | 6102 | 612 | 1 | 1 | 0 | 1 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 27 |
| 5 | 1 | 6 | 2015-07-26 | 0 | 0 | 0 | 0 | 0 | 0 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 26 |
| 6 | 1 | 5 | 2015-07-25 | 4364 | 500 | 1 | 0 | 0 | 0 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 25 |
| 7 | 1 | 4 | 2015-07-24 | 3706 | 459 | 1 | 0 | 0 | 0 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 24 |
| 8 | 1 | 3 | 2015-07-23 | 3769 | 503 | 1 | 0 | 0 | 0 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 23 |
| 9 | 1 | 2 | 2015-07-22 | 3464 | 463 | 1 | 0 | 0 | 0 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2015 | 7 | 22 |
Last rows
| | Store | DayOfWeek | Date | Sales | Customers | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval | Year | Month | Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1017199 | 1115 | 3 | 2013-01-10 | 5007 | 339 | 1 | 1 | 0 | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 10 |
| 1017200 | 1115 | 2 | 2013-01-09 | 4649 | 324 | 1 | 1 | 0 | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 9 |
| 1017201 | 1115 | 1 | 2013-01-08 | 5243 | 341 | 1 | 1 | 0 | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 8 |
| 1017202 | 1115 | 0 | 2013-01-07 | 6905 | 471 | 1 | 1 | 0 | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 7 |
| 1017203 | 1115 | 6 | 2013-01-06 | 0 | 0 | 0 | 0 | 0 | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 6 |
| 1017204 | 1115 | 5 | 2013-01-05 | 4771 | 339 | 1 | 0 | 0 | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 5 |
| 1017205 | 1115 | 4 | 2013-01-04 | 4540 | 326 | 1 | 0 | 0 | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 4 |
| 1017206 | 1115 | 3 | 2013-01-03 | 4297 | 300 | 1 | 0 | 0 | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 3 |
| 1017207 | 1115 | 2 | 2013-01-02 | 3697 | 305 | 1 | 0 | 0 | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 2 |
| 1017208 | 1115 | 1 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2013 | 1 | 1 |
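The Year, Month and Day columns in the rows above look like components of Date, and DayOfWeek is consistent with a Monday=0 numbering (2015-07-31, a Friday, is listed as 4). The following is a minimal sketch of such a decomposition using only the standard library. The function name `date_parts` is hypothetical, and returning Year as a string is an assumption based on the report treating Year as categorical; how the columns were actually produced is not stated in the report. pandas offers comparable accessors (`Series.dt.year`, `Series.dt.month`, `Series.dt.day`, `Series.dt.dayofweek`).

```python
from datetime import date

def date_parts(iso_date):
    """Split an ISO date string into the calendar columns seen in the sample rows."""
    d = date.fromisoformat(iso_date)
    return {
        "Year": str(d.year),       # kept as text (assumption: Year is categorical)
        "Month": d.month,
        "Day": d.day,
        "DayOfWeek": d.weekday(),  # Monday is 0, Sunday is 6
    }

# Dates taken from the first and last sample rows of the report.
first = date_parts("2015-07-31")
last = date_parts("2013-01-01")
print(first)
print(last)
```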